ABSTRACT

A study was undertaken to (1) examine the development 
and construction of a Group Informal Reading Inventory to predict the 
reading comprehension levels (independent, instructional, and 
frustration) of junior high school bilingual students for the purpose 
of reading instruction; and (2) validate the inventory through a 
three-way correlational study comparing the comprehension results
with those of a cloze test, a standardized test, and a questionnaire
by which teachers estimate students' reading levels. The study
involved 50 bilingual students of predominantly English- and 
Spanish-speaking, low- and middle-income backgrounds in an urban 




far below their developmental grade levels and their assigned present 
grade levels, and native language grades were lower than those in 
English. It is recommended that (1) a decision be made for each 
individual student as to whether he should be taught in two languages 
or, if his native language skills are insufficient to transfer to 
English as a second language, whether he should be taught in English; 
(2) testing for reading and content areas be administered regularly 
to monitor progress; and (3) there be careful regulation of the 
timing, techniques, content, materials, and evaluation of bilingual 
instruction. (MSE)
THE EFFECTIVENESS OF AN INFORMAL READING INVENTORY
IN IDENTIFYING THE FUNCTIONAL READING LEVELS 
OF BILINGUAL STUDENTS* 



Marie Lombardo 

INTRODUCTION 

The main purpose of this study is twofold: 

1. The development and construction of an original Group
Informal Reading Inventory (GIRI) in predicting the
comprehension reading levels (Independent, Instructional,
and Frustration) of junior high bilingual students
for the purpose of reading instruction.

2. The validation of the GIRI through a three-way correlational
study that will compare the comprehension
results of the GIRI with those of a cloze test, a
standardized test (the Stanford Diagnostic Reading
Test, 1976), and a questionnaire in which the teachers
estimate students' reading levels.

(For the data on validation of the GIRI, the reader may consult
the original dissertation study.)



*This paper is based on the author's doctoral dissertation
entitled "The Construction and Validation of the English as
a Second Language Assessment Battery: The Receptive Area," 
Boston University, 1979. 



Justification 

In recent years, it has been reported that in spite of the 
fact that bilingual programs do exist for bilingual students, 
these students are still performing below grade level (Herbert, 
1977) and, consequently, many become frustrated and eventually 
drop out of school (United States Commission on Civil Rights, 
1971). The frustration that these students encounter is based 
upon the fact that: (a) their reading levels are not properly 
assessed, their individual reading needs are not met, and as 
a result they are presented with materials that are too diffi- 
cult for them; and (b) students are often grouped for reading 
according to standardized test results. These results are not 
accurate because standardized measures tend to overestimate 
the reading levels of students, thus placing these students 
at their frustration rather than their instructional reading 
level (Wiechelman, 1971; Motta et al., 1974). Also, standardized
measures may not be appropriate if they were not designed
for the bilingual students to be assessed: 

When standardized measures are used, results 
may be reasonably reliable and valid, but 
interpretation of the individual student's
performance may not be possible if people 
like the student were not part of the group 
on which the test was normed. Using a stan- 
dardized test to assess reading ability for 
instance, is not appropriate when the student 
is not a native speaker of English. While 
the test might be used to compare the per- 
formance of a foreign-born bilingual with 
American monolinguals for diagnostic purpose,
the resulting score is not a meaningful esti- 
mate of reading ability of the bilingual 
student. (Morishima and Mizokawa, 1977, 
p. 2) 
Condon (1975) adamantly states that little research has
been conducted in the United States relating to the influence of 
cultural factors on the fairness of the assessment of culturally 
different students. She purports that most of the problems 
raised in assessment have come from bilingual or compensatory
programs, which require the use of standardized tests to measure
the functional progress of bilingual students. According
to Condon (1975), the problem is the contextual misconstruction
of items; there is a neglect of culture, misinformation, an
erroneous presentation of data, and finally bias, or the presentation
of distorted impressions of the foreign culture. As a
result, bilingual students are unfairly assessed in terms of
the content of these measures and the uses to which their results
are applied.

Surely, the case for reading assessment instruments to be 
specifically designed for bilingual students is apparent. How- 
ever, to place the need for such instruments in proper perspec- 
tive, the evolution of this problem in bilingual education must 
be presented in an historical overview. 

Historically, the United States has had many changes in its 
language policies. Until 1880, there was no language policy 
enforcing the sole use of English. There was tolerance for the 
pupils' primary languages (Spanish, C i, Portuguese, French,
Dutch, and Basque) (Anderson and Boyer, 1970).

At about 1917, with the occurrence of World War I, other
languages were considered a threat to nationalism. As a result
of the war, many people from southern Europe immigrated
to the United States. During this period, states such as
Connecticut and Massachusetts began enforcing the "use of 
English" policies and the requirement of literacy before voting 
(passed in 1855 and 1857, respectively). But the English-only 
policy, also known as the "melting pot idea," became intensified 
during World War II. This concept was not altered until 1954 
with the Brown v. Board of Education civil rights decision,
which called for equal educational opportunities for all races. 
The most important impact of the case was the fact that it 
agitated minority groups to also demand social and political 
opportunities. Through the 1960's, the evidence collected (Kobrick, 1972;
The Way We Go to School, 1970) indicated that the
traditional English monolingual educational system did not 
meet the needs of over 2.5 million students of language back- 
grounds other than English. 

In 1968, the Bilingual Education Act was passed as part 
of ESEA. This act, later revised in 1974, stated that the
educational program should allow non-English-speaking children to
be taught in English and the native language so as to facilitate
their progression through school. Language is the key
word in this act. The factor of language was further con- 
sidered in court cases, whereby parents wanted to ensure that 
their children were taught the native language before grade
eight. However, it was not until the Lau v. Nichols (1974) 
case that guidelines for bilingual education were established. 
This case was filed in San Francisco on behalf of the 1,800 
Chinese students who were not receiving appropriate language
assistance to enable them to compete on equal grounds with
their English-speaking classmates. The Department of Health, 
Education, and Welfare declared that the recipients of federal 
financial aid could not "restrict an individual in any way in 
the enjoyment of any advantage or privilege enjoyed by others 
receiving any services, financial aid, or any other benefit 
under the program" (Lau v. Nichols, 1974, p. 566).

To comply with the decision of the Court, the San Francisco 
Unified School District, with a citizen's task force, designed 
guidelines for school districts to follow in the case of stu- 
dents "whose home language is other than English" (Office of 
Civil Rights, 1975). The guidelines known as the "Lau Remedies" 
included the identification of the student's:

1. Language dominance according to five categories: 

a. Monolingual speaker of the language other than 
English (speaks the language other than English 
exclusively). 

b. Predominantly speaks the language other than English 
(speaks mostly the language other than English, but
speaks some English). 

c. Bilingual (speaks both the language other than English 
and English with equal ease). 

d. Predominantly speaks English (speaks mostly English, 
but some of the language other than English). 

e. Monolingual speaker of English (speaks English ex- 
clusively). (Lau Remedies, Office of Civil Rights, 
1975, p. 2) 

2. Frequency of use. 

3. Diagnosis and prescriptive approach for proposing an 
educational program: 

a. At the elementary and intermediate levels the program
may be transitional, bilingual/bicultural,
or multilingual/multicultural.



b. At the secondary level the program may be bilingual,
transitional, ESL or any of the above mentioned
combinations.

4. An outline of requirements of personnel teaching. 

5. A system for notifying the parents of the student's 
program. 

6. A way to evaluate the program. (Lau Remedies, Office 
of Civil Rights, 1975, pp. 2-3) 

Although these remedies were certainly a step in the right direction,
the problem is that once students are selected according
to the remedies (categories a, b, or c) and placed in a program,
it is not required that their competence in the
language skills (listening, speaking, reading, and writing) be assessed.
However, from an educator's standpoint, it is clear that assessment
should be required. More specifically, there is a pressing
need for reading instruments to be developed and validated
especially for bilingual students, taking into account their
interests and backgrounds in the assessment of their functional
reading levels (levels at which each of the students can function
adequately in the classroom). As will be demonstrated,
informal reading inventories (IRI) are proposed to meet such
a need.

The need for IRI's in assessing students' functional 
reading levels for the purpose of grouping and of matching 
materials to students' needs in monolingual and bilingual class- 
rooms has been the most urgent request of reading authorities. 
Harris (1961) claimed that the use of inappropriate reading
materials was the most frequent cause of reading difficulties 
faced by experienced teachers. Dechant (1970) declared that 
10-15 million children in regular classrooms attempted to read
books that were too hard for them; and that the use of inappro- 
priate textbooks interfered with reading progress. 

The claim that standardized tests could give the instruc- 
tional levels for individual children was disputed (Smith, 
1961; Tucker, 1975). If teachers did careful readability 
studies of the materials to be used in the classroom and de- 
vised IRI's and administered these, they would discover the 
instructional reading levels of their students. They could 
then match materials to student levels and needs, thus making 
instruction more effective. 

In response to the need for reading instruments,
this study will attempt to propose and validate an
original Group Informal Reading Inventory (GIRI) for bilingual
students.

REVIEW OF THE LITERATURE 

In reviewing the existing literature, it is apparent that
IRI's constitute a controversial issue. Researchers have focused
on different aspects of IRI's and have formulated conflicting
opinions on their definition, construction, administration,
and scoring. But, in spite of these disputed areas, researchers
are in agreement regarding their purpose, need, and validity. For
the purpose of this study, the areas of conflict will first be
described, followed by the purpose, need, and validity of
IRI's when they are compared with the cloze and standardized
tests.



In identifying the first conflict area--that of defining
the IRI--it is agreed by most reading authorities that IRI's
are informal diagnostic measures because usually they are neither
normed nor standardized (Johnson and Kress, 1965). More specifically,
Johnson and Kress describe an IRI as: (a) informal
because norms have not been established and the performance of 
one student is not judged against that of others, but against 
some standard of mastery; (b) reading because it evaluates 
the student's ability to manipulate ideas represented by words 
in the receptive and expressive language areas; and (c) in- 
ventory because it reports the students' complete comprehen- 
sive performance in reading, language, and thinking skills. 

Although an IRI is defined as a non-standardized reading 
test through which a student's reading performance is evaluated 
against predetermined standards (McCracken, 1967), researchers
(McCracken, 1964, 1970; Botel, 1961; Silvaroli, 1965) have developed
and standardized their own IRI's by establishing the
reliability and validity of their tests through various studies. 

McCracken (1964) conducted a study to validate his "Stan- 
dard Reading Inventory," which is based on the vocabulary of 
three basal readers, and tested the validity of its passages 
with well-known readability formulae. He then improved the 
content validity of the inventory by controlling the vocabu- 
lary, sentence length, content, and style of the reading 
selections and obtained norming data by administering these 
oral reading selections to 664 students in grades 1-6. The 
significant differences found in student performance as paragraphs
increased in difficulty were quite substantial. Since
there were alternate forms of the inventory, reliability was 
obtained by having two examiners administer each form to 60 
elementary school children. Reliability correlations between 
the two forms (for the Independent, Instructional, and Frustration
Reading Levels) ranged from .86 to .91. The correlations
between the two forms for the eight reading sub-skills measured 
by the inventory ranged from .68 for Word Recognition errors 
to .99 for Vocabulary in isolation. It was determined by the 
results of this study that the "Standard Reading Inventory"
reliably estimates students' functional reading levels. 
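
The alternate-form reliability figures reported above can be illustrated with a short sketch. It assumes a Pearson product-moment coefficient, which the source does not specify, and the two lists of reading levels are hypothetical values invented for illustration, not McCracken's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between paired scores from two test forms."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores on each form must vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical instructional levels for six students on Form A and Form B.
form_a_levels = [2, 3, 3, 4, 5, 6]
form_b_levels = [2, 3, 4, 4, 5, 6]
print(round(pearson_r(form_a_levels, form_b_levels), 2))
```

A coefficient near 1.0 would indicate that the two forms place the same students at nearly the same levels.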

Botel (1961) cross-validated his reading test and read- 
ability measures through the use of correlational and matching 
procedures. The correlations among all the tests and readability
measures were unusually high for both McCracken's
and Botel's tests. In a later study, McCracken and Mullen
(1970) further validated the IRI's by compiling the data
from the administration of the "Standard Reading Inventory" 
(SRI), the "Botel Inventory," and the Stanford Achievement 
Tests to 171 male and female students from grades 1-6. Re- 
sults indicated that there was concurrent validity between 
the mean levels of the Stanford Achievement Tests and the
instructional levels obtained from the SRI and the Botel In- 
ventory. The data also confirmed that the instructional reading 
levels can be measured validly.

Most researchers, however, have not advocated standardization
of IRI's. In addition to the explanation of the IRI, it
is necessary to delineate its purpose, which is three-fold:
(a) to assess the student's functional level and the student's
strengths and weaknesses for purposes of instruction (Johnson
and Kress, 1965); (b) to estimate the Independent, Instructional,
and Frustration reading level for each student so that reading
materials can be matched accordingly (Betts, 1957; Botel, 1969;
Beldin, 1970; Sipay, 1964; Kender, 1968); and (c) to serve as
a placement instrument in grouping students according to their
appropriate reading levels (Walter, 1974; Betts, 1940; McCracken,
1967). Both Walter (1974) and Pikulski (1974) claim that there
is no need to standardize IRI's since there is face validity
in IRI's when their passages and questions are sampled from the
materials used, or to be used, in the classroom.

In addressing the second area of conflict--construction of
IRI's--reading authorities have been divided in their opinions;
some have utilized and advocated the use of classroom materials, 
others have created their own reading selections for IRI's. 
Motta et al. (1974) advocate that in preparing IRI's, the
materials to be used in the curriculum should be utilized. 
They warn that in selecting these materials, one should take 
into account the elements of interest, culture, and language 
structure in dealing with non-native speakers. They especially 
emphasize that an analysis of semantics, lexicons, and syn- 
tactical structure should be conducted in order to ensure the 
appropriateness of these materials for non-native speakers. 

On the other hand, researchers--McCracken (1964), Botel
(1961), and O'Brien (1970)--have reviewed basal readers and
found it most feasible to create original IRI's based on their
subjects' interests. Otto et al. (1973) also agreed that an
IRI does not have to be constructed from the basal reader
series for students of the upper elementary and secondary
levels.

O'Brien (1970) conducted a study to confirm that her original
IRI's were more effective than those whose passages were
directly extracted from classroom basal readers. She studied
traditional IRI's and suggested a new method for devising them
in which words are taken from basal readers and incorporated
into an original paragraph. To discover the usefulness of this
procedure, a traditional IRI and one based on the new method
were administered to a group of second and fourth graders.
The results indicated that the new IRI's: (a) presented fewer
words per selection, (b) presented more new words in each
selection, (c) required fewer selections to be read, and (d)
gave an instructional level score (in ten cases) lower than 
did the traditional IRI. 

Once the context for the IRI has been selected, the next 
consideration is the type of IRI to be constructed. Types vary 
from those based on graded word lists (Silvaroli, 1965; Motta
et al., 1974) to a series of passages and corresponding comprehension
questions (McCracken, 1964; Botel, 1961; Silvaroli,
1965). Based on research and sensitivity to bilingual students' 
needs, the researcher decided to create original stories for 
the purpose of the study. The advantages of the stories are: 
1. Context was provided to facilitate reading. Thonis
(1976) contends that it is simpler for speakers of
other languages to read vocabulary that represents 
content. 

2. Original themes were employed to motivate and interest
the non-native speakers.

3. Lexicon (vocabulary), semantics (meaning), and syntax 
(grammatical structures) were manipulated and adjusted 
for these students. 

4. Dialogue was used as context in the stories so that 
students could be presented with a natural transition
from oral to written language (Ruddell, 1965).

When one wishes to construct the story type of IRI, Otto
et al. (1973) recommend the following passage lengths: (a) passages
of 30-100 words followed by five comprehension questions
for the primer level, (b) passages of 250 words followed by six 
comprehension questions for second and third grade levels, and 
(c) passages of 150 words followed by eight comprehension questions
for levels above fourth grade. The reason that passages
are increased in length is that students of limited-English-
speaking ability should not be frustrated by long selections.
However, longer passages are needed to facilitate the construc- 
tion of higher level questions. For the purpose of this study, 
the story type of IRI was constructed. The number of words 
for the passages and the number of comprehension questions
were increased at each level because the GIRI was intended for 
secondary school students. 
After the passages have been constructed, readability indices
are computed for each passage. Measures of readability include
the Spache (1953), which is intended for the third grade
level and below; the Dale and Chall (1948), which presupposes
a reading level of fourth grade; the Flesch (1948), intended
for grades 4-16; the Fry (1968) for grades 1-13; and the Smog
(cited in Vaughan, 1976) for grades one through college. Pauk
(cited in Vaughan, 1976) tested the Smog, Dale-Chall, and Fry
readabilities on the same passages of 20 articles and found
that: (a) the Dale-Chall and Fry scores were in agreement,
(b) the Smog scores disagreed with the Fry and Dale-Chall,
and (c) the Smog scores tended to be two grade levels above 
the Dale-Chall and Fry in estimating the grade levels of the 
passages. Based on these conclusions, the researcher decided 
to use the Fry for the primer to fourth grade passages, and 
the Dale-Chall for passages above the fourth grade level. The 
rationale for including passages as low as the primer level 
was to help the bilingual student meet with initial success. 
Thonis (1976) indicated that pupils who have limited-English-
speaking abilities need materials with a readability
rating about two to three years below that of English
speakers.
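
The choice of readability formula described above can be sketched as a simple rule. This is only an illustration of the stated decision (Fry for primer through fourth grade, Dale-Chall above fourth grade); the function name and the numeric coding of the primer level as 0 are assumptions made for the example and do not come from the study.

```python
PRIMER = 0  # hypothetical numeric code for the primer level


def choose_readability_formula(grade_level):
    """Return the formula the study applied to a passage of this grade level."""
    if grade_level < PRIMER:
        raise ValueError("grade_level cannot be below the primer level")
    # Primer through fourth grade: Fry; above fourth grade: Dale-Chall.
    return "Fry" if grade_level <= 4 else "Dale-Chall"


for level in (PRIMER, 2, 4, 5, 8):
    print(level, choose_readability_formula(level))
```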

Once the grade level of each passage has been determined, 
comprehension questions are prepared. The questions deal with 
the cognitive and affective domains so that the students'
knowledge and personalized perspective based on their individual
experiences are examined. For each passage: (a) questions
dealing with the cognitive domain were hierarchically arranged
from literal to synthesis levels of comprehension, according
to Bloom's (1956) and Barrett's (cited in Lapp and Ramsey, 1976)
taxonomies; and (b) questions dealing with the affective domain
ranged from simple awareness of situations to the complex determination
of one's own values and philosophy of life, according
to Krathwohl et al.'s taxonomy (1964).

The questions were constructed according to Lindvall's 
(1967) model of Bloom's Taxonomy, which delineates the testing 
objectives and the most expedient way to elicit responses. The
model outlines and examines the levels as follows:

1. The knowledge level deals with the student's ability
to recall terms, facts, rules, principles, and other generalizations.
The testing objectives are for the student to be able
to name, list, state, describe, and define material presented.

2. The comprehension level is concerned with the pupil's
ability to understand a given content, put it in the pupil's
own words, summarize it, and explain it. The testing objectives
are for the student to be able to translate, give examples,
illustrate, interpret, summarize, and explain given materials.

3. At the application level, the focus is on the student's
ability to use rules, methods, procedures, principles, and other
types of generalization to produce or explain given consequences
or to predict the results of a given situation. The testing objectives
are to be able to solve, predict, develop, explain,
and apply knowledge in a given situation.

4. At the synthesis level, the pupil's ability to develop,
create, or produce something is tested. Specific testing objectives
are to develop a plan, write a paper, produce, create,
or demonstrate a particular work (Lindvall, 1967, pp. 19-20).

For the purpose of this study and the fulfillment of
the testing objectives at each level of cognition, multiple- 
choice items were constructed for the following reasons: (a) 
to provide a stimulus in order to evoke recall, (b) to facili- 
tate testing since it is easier for non-native speakers of 
English to recognize and identify rather than retrieve infor- 
mation, (c) to evaluate a greater variety of abilities, (d) to 
reduce guessing by providing several alternatives (Lindvall,
1967), and (e) to provide an objective method of scoring, a
notion supported by Lowell (1969) who advocated that more atten- 
tion should be given to identifying reading performance in ways 
that do not rely upon the examiner's judgement. 

Once the IRI has been constructed, the third area of con- 
flict to be examined is that of administering the IRI. The 
issues involved are: (a) the technique to be adopted--whether
students are to read the passages orally, silently, or listen
to the teacher read them and then respond to comprehension questions;
(b) the procedure for administration--whether the test
is to be administered individually or in a group; and (c) the choice of the
examiner to administer the IRI.

In addressing the problem of the technique for administering 
the IRI, it seems that researchers (Burmeister, 1974; Lowell,
1969; Dunkeld, 1970) agreed that the value of employing
the oral IRI technique above sixth grade is questionable. Dunkeld
(1970) strongly contended that comprehension rather than word
recognition scores obtained from oral reading seemed to be more
valid indicators of passage difficulty above the fifth grade
level. Furthermore, the technique of reading IRI's orally has
not been supported in the literature for several reasons. There
is a lack of uniformity among examiners as to whether passages
should be read aloud or silently first; the problem is obvious:
students reading the passages silently first would, of course,
perform better. Also, the true word recognition score is not
accurately identified after the student has read several passages
orally. Dunkeld (1970) conducted a study and found that with
the practice of reading several passages, students' word recognition
scores increased. The problem is evident here; one
examiner could choose the initial word recognition scores as 
indicators of the student's reading levels and another may 
assume the latter scores to be indicators of the student's 
reading levels. 

Along with the method of administering the oral IRI, the 
major controversy centers around the criteria for scoring oral 
reading errors. There is agreement on the behavioral characteristics
to be considered, but there is disagreement as to the
definition of an oral reading error (Wilson, 1972).

The behavioral characteristics are listed as mispronuncia- 
tions, substitutions, omissions, insertions, regressions, hesi- 
tations, and punctuation (Johnson and Kress, 1965; Burmeister, 
1974; McCracken, 1967). However, the meaning of these terms 
varies from one examiner to the other. Because of this lack of
agreement, both early and contemporary reading experts have 
placed less significance on reading errors as determiners of 
the students' reading levels. 

An early researcher (Betts, 1936) believed that the student
could have faulty oral reading and still achieve comprehension.
A later researcher (Powell, 1971) also found that errors in oral
reading did not always affect comprehension. He studied the
congruent validity of the Betts (1940) formula as far as the
Word Recognition area was concerned, presuming the Comprehension
Score of 70-75%. Powell (1969) argued that in spite of
the fact that examiners used the same formula, discrepancies 
still arose. The problem was the procedural differences of 
examiners. Some examiners allowed students to read passages 
silently before reading them aloud, some had the students read 
only orally, others counted repetitions as errors. 

Powell (1969) went further to say that if the Comprehension 
Level of 70-75% remained, the student could tolerate whatever
Word Recognition Errors accompanied that performance level.
He found that younger children could tolerate more Word Recogni- 
tion Errors than older children and still maintain the same 
level of comprehension. This would be true especially for 
non-English speakers; they could make numerous Word Recognition 
Errors--especially mispronunciations--and come out with a low 
Word Recognition Score; yet their comprehension could remain 
the same throughout different levels. 
Powell (1971) further studied oral reading by analyzing
five tests: Spache Diagnostic Reading Scales, Durrell Analysis
of Reading Difficulty, the Gilmore Oral Reading Test, the Gray
Oral Reading Test, and the Gates-McKillop Reading Diagnostic
Test. Two IRI's were also studied: McCracken's "Standard
Reading Inventory" and Silvaroli's "Classroom Reading Inventory."
Each test was analyzed for: (a) the number of words
to be read in each passage (a count was made); (b) the maximum
number of errors allowable for that passage to be regarded
within limits of acceptable reading behavior from a given norm;
and (c) Word Recognition, which was computed for each passage
by dividing the number of words to be used by the number of
errors allowable. The results were as follows: (a) all Word
Recognition error ratios increased in error latitude as the
difficulty of the materials increased and the age-grade of
the sample increased; and (b) one error in every 20 words (95%
Word Recognition Accuracy) for determining Instructional Level
is hardly justified because in an earlier study, Powell (1969)
had discovered that some students could tolerate more oral
reading errors than others and still comprehend.
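
The arithmetic behind the Word Recognition ratio and the 95% accuracy figure can be shown in a brief sketch. The function names and the 100-word passage are hypothetical and serve only as an example; the calculation itself follows the description above (words in the passage divided by allowable errors).

```python
def error_ratio(words_in_passage, errors_allowed):
    """Words read per allowable error (20.0 means one error in every 20 words)."""
    if errors_allowed <= 0:
        raise ValueError("errors_allowed must be positive")
    return words_in_passage / errors_allowed


def accuracy_percent(words_in_passage, errors):
    """Percentage of words in the passage read without error."""
    if words_in_passage <= 0:
        raise ValueError("words_in_passage must be positive")
    return 100.0 * (words_in_passage - errors) / words_in_passage


# Hypothetical 100-word passage with 5 allowable errors.
print(error_ratio(100, 5))       # 20.0, i.e. one error in every 20 words
print(accuracy_percent(100, 5))  # 95.0
```

A larger ratio means a stricter criterion; Powell's finding that error latitude grew with passage difficulty corresponds to a smaller ratio at higher levels.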

More recently, researchers (Goodman and Burke, 1972) from
a linguistic background have questioned the rating of oral
reading errors on IRI's. They have considered examining oral
reading miscues on the basis of semantic, lexical, or syntactical
errors. They present a more general schema of reading
miscue errors, which are based on dialect, intonation, graphic
similarity, sound similarity, grammatical function, correction,
grammatical acceptability, semantic acceptability, and meaning
change. Although this model of scoring oral reading has been
criticized, it would seem to be more acceptable in purposefully
identifying oral reading errors, especially for bilingual
students; but for the purpose of this study, Goodman's Miscue
Analysis (1972) was not used.

Since the concern of the researcher was to define reading 
in terms of comprehension, oral reading did not seem appropriate 
for ascertaining the bilingual student's comprehension level. 
For the purpose of this study, silent reading was deemed appro- 
priate because the student would not be punished for pronun- 
ciation and accent. The student would be comfortable and would 
not feel embarrassed as in the case of oral reading, which many 
researchers (Otto et al., 1973; Lowell, 1969; Roswell and Natchez,
1964) have viewed as a trying task for some students. Students
usually concentrate intensively on the decoding task and neglect
comprehension. As a result, it would then be unfair to estimate
their comprehension based on oral reading.

In administering the IRI, it was decided to allow students
to read silently without a time limit per passage, although a
time limit for the testing session was established. It was
hoped that through a general time limit, the examiner could
determine which students needed training for speed. It was
also hoped that administration of the silent reading IRI would
determine: (a) which students needed work in grasping the
meaning of written materials (Bolenius, 1919); (b) the comprehension
problems students were encountering; (c) the strengths
and weaknesses in the student's ability to follow directions,
to read for understanding, and to use context (Dechant, 1970;
Wilson, 1972); and (d) specific skills in finding the main idea,
character analysis, sequencing of details, making inferences, 
drawing conclusions, cause and effect, and higher-level ques- 
tions asking the student to apply concepts to himself/herself 
(Valmont, 1972). 

Asking the bilingual student, or any student for that matter, 
to apply concepts to his/her own experience helps make reading a 
more relevant experience. In fact, Knowles (1975) states that as 
a person grows, the concept of himself/herself moves from dependency 
to self-directedness. This self-direction is accompanied by a
wealth of experience that can be a rich resource for learning 
and a readiness to learn, particularly those things that will 
help the student directly in daily problems. 

The comprehension questions in the stories ask the student 
to relate original themes to his/her own life experience. This 
should not only facilitate the testing of comprehension, but 
should also provide the student with a positive attitude toward 
reading--the ultimate goal of reading. 

Another facet of IRI's that has been controversial is the 
Word Recognition and Comprehension Scoring criteria to be used 
in determining the three functional reading levels: (a)
Independent Level, (b) Instructional Level, and (c) Frustration
Level. In studies since the 1940's, various criteria have
been proposed for both word recognition and comprehension. 

Early research in creating criteria for scoring IRI's was
initiated by Killgallon and Betts (cited in Walter, 1974). Since
Killgallon had worked closely with Betts at the Pennsylvania
University Reading Clinic, the scoring criteria developed became
known as the "Betts-Killgallon Criteria." The exact research
base for the development of the criteria is not very well known,
and for this reason they have been criticized. However, the
criteria have generally remained applicable for IRI's.

Beldin (1970) suggested that the criteria were not arbitrarily
established but were derived from a requirement of 50%
comprehension for the understanding of material (Bolenius, 1919),
and that for the student to read with meaning, he/she should not
have more than one difficult word in 20 continuous words (Durrell,
1956). The Betts-Killgallon Criteria, which have been applied in
most texts discussing IRI's, provide a Basal Reading Level of
90%, a Probable Instructional Level derived by a minimum
comprehension score of 50%, a Probable Reading Capacity Level of
75%, and a Probable Frustration Reading Level attained by a
comprehension score of 20% or lower.

Cooper (cited in Beldin, 1970) conducted a study with stu- 
dents of grades 1-6 to determine the relationship between the 
relevance of using symptoms of reading problems, demonstrated 
with certain materials, as a basis for predicting suitability 
of reading materials for the purpose of instruction; and to de- 
velop criteria that could be used to estimate the level of 
reading materials. Cooper used standardized tests and an IRI 
at the beginning. At the end of his six-month experimental
period he found that: (a) students who had made the most progress
were those who had two or fewer errors in 100 running words of
reading matter; (b) this group of students had been placed in
appropriate instructional materials; and (c) primary students
who made seven or more errors in 100 running words, and at the
intermediate level, students who made 11 or more errors in 100
running words were placed in materials that were not suitable for
their instruction. Therefore, Cooper set Comprehension Level
Scores at 70% for primary and 60% for intermediate level
students.

McCracken (1967) provided detailed criteria for counting
Word Recognition errors, and he redefined the criterion levels
as: (a) Independent Level, when the score of every passage has
been rated as Independent Level; (b) Instructional Level, when
half of the scores fall below the questionable half of the
Instructional Level; and (c) Frustration Level, when the score
of one passage is rated at the Frustration Level. The
percentages proposed for Word Recognition are: (a) Independent
Level: 99-100%, (b) Instructional Level: 95-98%, and (c)
Frustration Level: 94% or less. For Comprehension, the scores
are: (a) Independent Level: 90-100%, (b) Instructional Level:
51-89%, and (c) Frustration Level: 50% or less.

Kender (1968) compared the results of three instructional
reading levels designated by three IRI's using three different
criterion measures. The results of the three IRI's revealed
significant differences among the means of a group of
eighth-grade students. It seemed that a discrepancy among test
results would have been obvious if tests, calculation of errors,
scores, and criteria were all compared. However, among the
IRI's in this study, too many factors were being considered to
make valid conclusions about them.

Powell (1969) disputed the Betts-Killgallon Criteria for
scoring IRI's and said they were too high. He maintained through
his own studies that pupils in grades one and two could tolerate
an 85% Word Recognition Score and still comprehend 70% of
the material; and students in grades three through six could
attain a 91-94% score on Word Recognition, and comprehend 70%
of the material at the same time.

Ekwall et al. (1973) conducted a study using a polygraph
to validate the criteria for scoring IRI's. Their aim was
to see if one set of criteria was more applicable to students
of various mental ages, sexes, ethnic backgrounds, reading
levels, and personalities. To execute the study, they sampled
150 students of the third, fourth, and fifth grades. The
students were administered an IRI individually, and they were
monitored and taped by the polygraph to note their behavior
during oral reading and responding to comprehension questions.
Results were analyzed, and it was found that: (a) there was
a significant difference in polygraph-measured frustration
reading levels for third, fourth, and fifth graders; (b) there
was no significant difference in polygraph-measured frustration
reading levels between boys and girls; (c) there was no
significant difference in polygraph-measured frustration reading
levels among ethnic groups; (d) a comprehension criterion of
50% was adequate; and (e) good readers become frustrated with
fewer oral reading and comprehension errors than poor readers.
In conclusion, Ekwall et al. did support Betts' Criteria for
determining the Frustration Level of their students.

In light of all the studies on scoring IRI's, for the purpose
of this study the researcher has set the criteria of: (a)
90-100% for the Independent Level; (b) 75-89% for the
Instructional Level; and (c) 0-74% for the Frustration Level in
determining the comprehension levels of bilingual students. Motta
et al. (1974) employed similar criteria for their non-English
speakers.
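
The cut-offs adopted for this study amount to a simple lookup
from a comprehension percentage to a functional reading level.
The following is a minimal sketch in Python, not part of the
original study: the function names are hypothetical, and the
handling of fractional scores that fall between the stated bands
(for example 89.5%) is an assumption.

```python
def percent_score(correct: int, total: int) -> float:
    """Convert a raw count of correct answers to a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return 100.0 * correct / total


def comprehension_level(percent_correct: float) -> str:
    """Map a comprehension percentage to a functional reading level.

    Cut-offs follow the criteria stated in the text: 90-100%
    Independent, 75-89% Instructional, 0-74% Frustration.
    Assumption: a fractional score between two bands is assigned
    to the lower band.
    """
    if not 0 <= percent_correct <= 100:
        raise ValueError("percent_correct must be between 0 and 100")
    if percent_correct >= 90:
        return "Independent"
    if percent_correct >= 75:
        return "Instructional"
    return "Frustration"
```

For example, a student who answers 8 of 10 questions correctly
scores 80% and would be placed at the Instructional Level under
these criteria.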

Another problematic area to be addressed is who is qualified
to administer IRI's. The philosophy of IRI's holds that the
IRI can best be utilized by the classroom teacher. However,
research has indicated that classroom teachers need training in
construction, administration, and scoring of IRI's in order to
effectively utilize them in the classroom. Perhaps teachers
who are intimidated by their inadequate knowledge of readability,
reading skills, taxonomies, methods of assessment, as well as
prescription of remediation for students, are hesitant about
employing IRI's in the classroom to: (a) match reading materials
to their students' levels, (b) diagnose students' reading
strengths and weaknesses, and (c) group students for reading
instruction.

The following studies reported data on the teacher as the
crucial factor in diagnosis, and the effectiveness of teachers
before and after receiving training in administering IRI's.
Sipay (cited in Durkin, 1970) reported that one consistent
finding from research is that the teacher is the crucial
factor when students' achievement in reading is concerned.
Durkin (1970) contended that research findings do not place enough
emphasis on the teacher factor, and do not closely speculate
what makes a successful or unsuccessful teacher. She questioned
diagnosing reading levels until the teacher's knowledge of
reading skills has been affirmed because the success or failure
of diagnosis and interpretation of results for grouping and
instructing students is dependent upon the teacher's knowledge.
Farr (1970) claimed that the validity of the IRI is highly
dependent upon the ability of its instructor and administrator.

Negative results have been reported with respect to
teachers' ability to diagnose reading abilities before formal
training. Emans (cited in Lowell, 1969) conducted a study
rating 20 teachers in a graduate remedial reading course on
their ability to distinguish reading skills needed by their
pupils, whom they tutored one hour per day for five weeks. He
found that the teachers had preconceived notions as to what
skills the students needed, and were neither perceptive nor
accurate in their diagnosis of students' abilities.

Mills (cited in Lowell, 1969) reported that experienced 
classroom teachers were not aware of the frustration reading
levels among their students when they were asked to estimate 
the frustration levels by using IRI techniques. 

In order to determine whether teachers who had limited
training in administering and scoring IRI's could assign
appropriate instructional levels to students as accurately as
clinicians, Blynn (1970) designed a correlational study. She had
alternate forms of the "Standard Reading Inventory," Forms A
and B, administered to 30 students by three pairs of teachers
and clinicians. Her findings indicated that differences in
reader levels between pairs of examiners existed in 24 out of
30 cases. This supported the fact that classroom teachers
need training in administering and scoring IRI's for assigning
appropriate instructional levels to their students in grouping
them for reading or meeting the students' instructional needs.

On the other hand, the following studies presented positive
results in that teachers who were trained were able to
effectively utilize IRI's. Kelly (1969) conducted the Berea,
Ohio In-service Educational Experiment to investigate the
importance of training teachers for the purpose of discovering
the relative effectiveness of an adopted model of the IRI
Instructional Process as a means of helping teachers become more
cognizant of instructional reading levels in the classroom,
and whether the time of the school year when the teacher
participated in the in-service made a difference in the teacher's
awareness of the instructional reading levels of students in
the classroom. The results were:

1. In terms of evaluative basal reading materials, teachers
who participated in simulation-type in-service programs
early in the school year were more aware of the instructional
reading levels of students in the classroom.

2. There was a considerable difference between the teachers
who had participated in the simulation in-service program
later in the year and teachers who did not participate
at all.

3. Teachers who participated in the in-service program 
early in the year were significantly more aware of in- 
structional reading levels than teachers who did not 
participate. 

4. Primary school teachers were more aware of instructional 
reading levels in classrooms than intermediate school 
teachers (Kelly, 1969, p. 7). 

The reason was that elementary teachers, in contrast to 
intermediate teachers, have to teach reading and are aware of 
the process. Nonetheless, the implication here is obvious: 
teachers do need training in preparing, administering, and
interpreting IRI's; but most important, they need some background
in the teaching of reading.

Ladd (cited in Lowell, 1969) re-affirmed that teachers need 
training in reading instruction through a study that found that 
after 30 hours of intensive training, teachers were able to eval- 
uate the reading performance of students accurately. 

For the purpose of this study, to ensure that classroom
teachers of bilingual students effectively used the GIRI, a
teacher-training workshop was conducted. The focus of the
workshop was the reading process for bilinguals as well as
administering and scoring IRI's and prescribing remediation.

While literature dealing with the relationship of the IRI
and cloze tests used with English speakers is limited,
literature dealing with the same relationship with non-native
speakers is practically non-existent. Research has reported that
cloze tests can be used as substitutes or partial substitutes for
the IRI (Burmeister, 1974). This concept was elaborated by Bormuth
(1967), who also found that the comprehension scores of IRI's
and cloze tests correlated significantly for English speakers.
Similar research for non-English speakers was not found.
However, it was reported that the IRI was effective in determining
reading levels of non-native speakers (Motta et al., 1974) and
the cloze test was an effective measure for determining the
comprehension levels of non-native speakers (Jongsma, 1971; Aitken,
1977; Oller and Conrad, 1971; Oller, 1972; Stubbs and Tucker,
1974). This led the researcher to hypothesize that perhaps
the comprehension scores of IRI's and cloze tests would
correlate for non-native speakers. This hypothesis was based on
Bormuth's study (1967), which will be discussed along with other
studies reporting on the relationship of IRI and cloze tests.

Bormuth (1967), the reading expert widely identified with
cloze procedures, conducted three studies based on Taylor's
work (1953). Bormuth's first study examined the relationship
of comprehension scores for multiple-choice and cloze tests,
constructed for each of his nine passages. The cloze test was
given and within three days the multiple-choice tests were
administered to 100 fourth and fifth graders. The reason for this
time lapse is that the same passages were used for the
multiple-choice and cloze tests. When scores were collected and
compiled, a significant correlation was found between scores on
the cloze and the multiple-choice tests.

For the purpose of this study, the Bormuth study was
replicated. An original IRI consisting of eight passages was
prepared. From the GIRI stories, a cloze test and a
multiple-choice test were prepared for each of the eight passages.

In another study Bormuth (1969) attempted to examine
factors in the validity of the cloze test as a measure of reading
comprehension. A series of passages were leveled according to
the Dale-Chall Readability Formula, and a cloze test and
multiple-choice IRI were constructed for each passage. The
tests were administered to 150 fourth, fifth, and sixth graders.
Only the exact words omitted were accepted as correct on the
cloze test. The results indicated that there was a
high correlation between scores on the multiple-choice and the
cloze tests.

Bormuth (1968) conducted another study to determine a set
of criterion scores for cloze readability tests that would be
comparable to the criterion scores used with oral reading tests
to determine the readability of passages, and to further
substantiate that cloze tests are measures of comprehension. He
tested 120 pupils of grades four, five, and six, using four
forms of the Gray Oral Reading Test (Robinson and Gray, 1963),
each containing passages leveled from primer to high school
level. Each student was administered two cloze tests and an
oral reading test, which tested word recognition and
comprehension. The cloze test scores correlated highly with the
comprehension scores of the oral test. Comparable criteria were
determined with cloze scores of 44% and 57% corresponding with
comprehension criterion scores of 75-90%, and cloze scores of
33-54% corresponding with word recognition scores of 95-98%.

Other evidence also indicates that there is a correlation
between the cloze test and the IRI (Wiechelman, 1971).
Wiechelman conducted a study comparing the functional reading
levels identified with a cloze test and the functional reading
levels identified with an IRI for 71 eighth-grade students, of
which 13 were Spanish-surnamed. The conclusions indicated that
there was a positive relationship between the functional reading
levels identified by the cloze test and by the IRI. It was
found that the mean functional reading levels through the use
of the cloze test for these students did estimate their mean
functional reading levels as reported by the IRI. Also, the
mean instructional reading levels from the cloze test for
Spanish-surnamed students did approximate their mean instructional
reading levels on the IRI. Finally, when the instructional levels
of the IRI and cloze tests were compared with the Durrell
Listening Reading Test, results indicated that the IRI and cloze
tests were more accurate in identifying the reading levels.

To demonstrate the effectiveness of the cloze test with
non-native speakers, Oller and Conrad (1971) constructed a cloze
test and administered it to 102 foreign students entering UCLA.
Only the exact words omitted were counted as correct in the
scoring. The students were also administered the UCLA English as
a Second Language Placement Examination (ESLPE) as a basis of
comparison with the cloze test. The researchers found the highest
correlation between the cloze test and the Dictation sub-test
(which measured listening comprehension), and the next highest
correlation between the cloze test and reading. These correlations
led the researchers to recommend the cloze test as a good method
for measuring language proficiency and comprehension levels,
which can be used in placing non-native English speakers in
English and reading classes.
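
The exact-word scoring rule used in these cloze studies, in
which only the word originally deleted counts as correct, can be
illustrated with a short Python sketch. It is not drawn from any
of the cited studies: the function name is hypothetical, and
ignoring capitalization and surrounding spaces when comparing
words is an assumption.

```python
def score_cloze_exact(deleted_words, responses):
    """Return the percentage of blanks filled with the exact deleted word.

    deleted_words: the words removed from the passage, in order.
    responses: the student's answers, in the same order.
    Assumption: comparison ignores case and surrounding whitespace;
    synonyms and misspellings are counted as incorrect.
    """
    if len(deleted_words) != len(responses):
        raise ValueError("each blank needs exactly one response")
    if not deleted_words:
        raise ValueError("the test must contain at least one blank")
    hits = sum(
        1
        for key, answer in zip(deleted_words, responses)
        if answer.strip().lower() == key.strip().lower()
    )
    return 100.0 * hits / len(deleted_words)
```

Under this rule a response of "cat" for a deleted "dog" earns no
credit even if it fits the sentence, so a student who restores
three of four deleted words exactly receives a score of 75%.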

Research on the IRI's has also focused on the relationship 
between scores on IRI's and on standardized reading tests. It 
has been generally discovered that standardized reading tests 
tend to overestimate the students' functional reading levels, 
and IRI's are more accurate measures for placing students at 
their appropriate reading levels. 

Betts (1940) attempted to study the accuracy of standardized
measures as compared to informal procedures for assessing
reading grade placement. He administered five silent reading
tests: the Gates Reading Survey, the Stanford Achievement Test,
the Durrell-Sullivan Reading Achievement Test, the
Sangren-Woody Reading Test, and the Iowa Silent Reading Test,
Advanced. These were used to test fifth graders, and their scores
were compared with the scores on an author-constructed IRI.
Generally, Betts found the IRI to be a more accurate measure
of reading levels.

Sipay (1964) studied the levels of reading achievement as
measured by standardized reading tests and those levels
determined by an IRI. He administered three standardized tests and
two parallel forms of an IRI, and concluded that the three
standardized tests tended to overestimate the instructional level
of students by one or more grade levels. Only two of the
standardized tests, the Metropolitan Reading Test and the Gates
Reading Survey, appeared to indicate the instructional reading
levels when a more stringent criterion was applied.

McCracken (1962) has recommended the use of IRI's rather
than standardized group reading tests for the purpose of
obtaining instructional reading levels. He compared the
performance of 56 sixth-grade pupils on the Iowa Every-Pupil Test
of Basic Skills, Test A: Silent Reading Comprehension to the
reading comprehension and vocabulary scores on an IRI, which
included oral and silent reading. The three levels of
performance on the IRI were the Immediate Instructional Reading
Level, the Maximum Instructional Reading Level, and the Word
Recognition Level. Results indicated that the average difference
between the Iowa Reading Comprehension grade levels and those
estimated by the IRI was 2.3 years. The difference for the
Vocabulary Grade Score was one year, and was higher for the
Iowa Test. McCracken (1962) concluded that the use of the
standardized test scores to determine the level of instruction
would place 63% of the students at a Frustration Reading Level.
He suggested that the instructional level be two grades below
the standardized test scores. McCracken's suggestion has validity
only for the Iowa Every-Pupil Test of Basic Skills, which he
used in his study, and the reading materials that formed the
basis of his IRI.

Williams (cited in McCracken, 1962) compared the performance
of fourth, fifth, and sixth graders on an IRI based on their
classroom basal readers with their scores on the California
Reading Test, the Gates Reading Survey, and the Metropolitan
Achievement Tests: Reading. When an IRI containing
selections from basal readers with which the students were
familiar was used, the standardized tests were found to place
students relatively near their instructional levels.

Brown (1963) designed a study to determine if a difference
existed in the instructional reading levels as indicated by an
IRI and the grade placement reading scores of five standardized
reading tests. She administered the California Reading
Test, the Metropolitan Reading Test, the Stanford Reading Test,
the Iowa Test of Basic Skills, the Gates Reading Test, and an
IRI to 192 elementary school children. After all the tests
were administered, a second IRI was given to 49 of the
students for the purpose of establishing the reliability of the
IRI. When teachers were asked to estimate students' reading
levels, it was found that their prediction correlated only with
the IRI. Finally, results on the two IRI's correlated highly,
and no significant difference was found between pupils' scores
on the two forms.

Sipay (1964) attempted to obtain evidence on the extent
to which the level of reading achievement, as measured by
standardized reading achievement test scores, differed from
functional reading levels, as estimated by an author-constructed
IRI. He administered the Metropolitan Achievement Test: Reading,
the Gates Reading Survey, and the California Reading Test to 202
subjects from 8 fourth-grade classes. The students were given
individually administered IRI's, which were based on selections
from the Scott Foresman Reading Series. The criteria for
determining the functional reading levels were as follows: for the
Instructional Level, the Cooper Criteria of 96% with Word
Pronunciation 96-99% and Comprehension of a minimum of 60%, and
the Betts Criteria with Word Pronunciation 90-95% and
Comprehension of a minimum of 60%; for the Frustration Level,
Word Pronunciation less than 90% and Comprehension of 50% or
less. The statistical analysis of the test scores indicated that:

1. In estimating with the Cooper Criteria, the Instructional
Level indicated by all three standardized tests tended
to overestimate the instructional level by approximately
one or more grades.

2. When the Betts Criteria were used, the mean score of the
Metropolitan Test was .11 grade levels higher, while
the Gates Reading Survey overestimated the Betts Criteria
Instructional Level by .29 of the grade level,
and the mean of the California Reading Test was 1.02
higher than that of the Betts Criteria Instructional
Level.

3. The standardized tests, when compared with Frustration
Level Criteria, were significantly lower in the case
of the Metropolitan and Gates tests.

4. A comparison of the means of the Frustration Level and
the California Test revealed that the California Reading
Test underestimated the Frustration Level by .24 of
the grade level. These differences were significant
at the .05 level.

In conclusion, Sipay (1964) stated that these findings
suggest that it is impossible to generalize as to whether
standardized reading achievement test scores tend to indicate the
Instructional or Frustration levels; rather, it appears that in
making such a judgment one must consider the standardized
reading test used and the criteria employed to estimate the
functional reading levels.

In a study of the relationship of pupils' scores from IRI's
and standardized tests, Glaser (cited in McCracken, 1962)
compared the functional reading levels of retarded seventh-grade
and advanced third-grade students to their scores on the Gates
Reading Survey. All the students in both groups had scored
between 5.0 and 5.9 on the Gates Reading Survey. The result was
that standardized reading tests tended to overestimate reading
levels. The findings of this study were:

1. The Instructional Levels of the advanced and retarded 
readers were consistently lower than the levels of 
their standardized reading test scores with a slightly 
larger spread evident for retarded readers. 

2. Sixteen (52%) of the retarded seventh-grade readers 
reached Frustration Level in passages of fifth grade 
difficulty. Seventeen (50%) of the third-grade pupils 
met the criteria for Frustration at this level. 

3. The Instructional Levels were consistently below the
standardized reading test scores for the two groups.

4. Providing reading instruction and materials for
students on the basis of standardized reading test scores
could hinder their progress and possibly affect their
attitude toward reading.

Leibert (cited in McCracken, 1962) compared the scores of
IRI's and the Gates Advanced Primary Reading Test for second
graders. Leibert reported differences in grade placement for
the two measures but suggested that these differences may be
due to the wider range of skills included in a group
standardized test, while reading as measured by the IRI is more
narrowly defined.

Patty (1965) contrasted scores on the Gilmore Oral Reading
Test and the Gray Oral Reading Test with the IRI performance.
Patty found that it was impossible to generalize as to whether
standardized oral reading tests indicate the functional reading
levels of children as accurately as IRI's do. Because of the
economy of administration and the usefulness of the information
they provided, the Gray Oral Reading Test and the IRI were
deemed the most desirable instruments for determining
functional reading levels.

Brown (1963) came to a similar conclusion in a study using
the following silent reading tests: the California Reading Test,
the Metropolitan Achievement Test: Reading, the Stanford
Achievement Test: Reading, the Iowa Every-Pupil Test of Basic
Skills, and the Gates Reading Survey. Brown found no consistent
relationship between performance on these tests and on informal
inventories. The Brown and Patty studies are not directly
comparable; Brown used standardized silent reading tests, while
Patty used standardized oral reading tests.

Botel (1969) confirmed that standardized reading tests have
not been accurate in indicating at what level the student should 
be reading. As a result, 10 to 15 million students in United 
States schools are reading books that are beyond their instruc- 
tional levels. Botel (1961) explained that he and other reading 
authorities agreed that the reading of children forced to read 
books that are too difficult for them is affected negatively. 

Farr (1969) contended that standardized test scores were
almost useless for the diagnosis of students' instructional
reading levels in the classroom. More specifically, Burmeister
(1974) asserted that standardized silent reading tests have
little diagnostic value because only a limited number of any
one type of item was included in any one form of these tests;
hence, they provided the teacher only with a grade or percentile
score that ranks students according to norms for a given
population. The scores obtained from these silent reading tests by
students above the primary grades tended to overestimate reading
levels and place students at their frustration levels. A
teacher has to judiciously subtract a year from each student's
total score to estimate the student's instructional level and two
years to estimate the student's independent level.

Smith (1970) compared the results for fourth-, fifth-,
and sixth-grade subjects from the administration of the
Vocabulary and Comprehension sub-tests of the Gates-MacGinitie
Reading Tests, Survey D, and the Vocabulary Reading and Paragraph
Reading sub-tests of the Durrell Listening and Reading Series,
Intermediate Level, with those of an IRI. The findings
demonstrated that for all three grade levels there were no
statistically significant differences between the mean grade
scores obtained from the standardized tests and the mean
instructional reading levels obtained from the IRI. However, the
Gates-MacGinitie and Durrell tests placed more than one-half and
one-third of the subjects, respectively, from one to five years
above their actual instructional reading levels. Neither the
sub-tests nor the total score from the two standardized tests was
accurate in estimating the students' instructional reading levels
for pupils of grades four, five, and six.

A study was conducted by Feldman et al. (1971) to find if
results from different measures could be used for various
purposes, such as placing the students in groups, or for selecting
appropriate reading material according to their reading levels.
They tested 96 children of grades one to three with two
standardized measures, the New York Growth in Reading and the
Metropolitan Achievement Test, and two non-standardized tests,
the "Harris Sample Graded Word List" and the "Graded Basal Readers
Test" used in the students' reading program. The results
indicated that the students' scores were higher on the
standardized tests, which tended to overestimate their functional
reading levels in the classroom; and the IRI's seemed better
suited for measuring the instructional reading levels in the
classroom. The four tests appeared to measure different skills;
the non-standardized measures tested specific sight vocabulary,
while the standardized tests measured global reading skills.

Wade (1971) tested 77 eighth-grade students to compare
their reading scores on the Gates-MacGinitie Reading Test,
Survey E, and the Durrell Listening Series, Advanced Level,
Form DE, and an IRI. The mean grade scores obtained from the
sub-tests and from the total test scores of the standardized
reading tests tended to overestimate the instructional reading
levels for these students by two and a half years.

O'Brien (1973) also contended that standardized tests mea- 
sured only global reading levels, while IRI's provided the basis 
for estimating Independent, Instructional, and Frustration 
Reading Levels. Scores on standardized tests gave the teacher 
information about a group of students in relation to age-grade 
norms, but the scores did not delineate the nature of the stu- 
dent's problems in reading. On the other hand, scores on the 
IRI provided the teacher with specific information on the
students' ability to attack unknown words, their ability to read
with comprehension, and the levels at which the students were 
capable of performing in these two areas. 

From these studies it can be surmised that standardized 
tests are not the best measures for assessing students' func- 
tional reading levels. However, the problem is further com- 
pounded if the reading levels of bilingual students are assessed 
using standardized reading tests. First of all, norms may have
been established with native English speakers. Secondly, items
on these tests may be culturally biased in that the tests
presuppose certain visual and auditory skills for which the
students may not have received adequate training. Thirdly,
bilingual students are made to feel deficient in tests of
unfamiliar lexical items and syntactical structures.

Motta et al. (1974) recommended the use of IRI's with ESL
and bilingual students because they claimed that standardized
tests did not take into account such factors as socio-economic
status, motivation, culture, or the psycholinguistic experience
of these students. In their study, they prepared two sets of
graded passages and corresponding comprehension questions for
these paragraphs. One set of passages was administered
individually and the other as a group test to non-English-speaking
adults. Both tests successfully indicated the students'
functional reading levels. They advised that to further estimate
the student's literacy in his/her own language, an IRI in the
student's native language should also be administered.

PLAN AND PROCEDURE OF THE STUDY

Sample Population 

Fifty bilingual students of predominantly English and
Hispanic-language backgrounds (with the exception of two
Italian children and one Lebanese child) were randomly
selected for this study. The subjects lived in an area that
included people of low and middle socio-economic status (SES)
backgrounds. They attended a city public school, which provided
them with a bilingual program of Spanish and English and an
ESL program. All students were instructed in the content area
subjects in Spanish until they gained language proficiency in
English. As soon as the students were capable of functioning
at grade level in English, they were removed from the bilingual
program and mainstreamed into the English curriculum.

Instruments 

1. A Group Informal Reading Inventory (GIRI) was
constructed as a set of eight passages based on original
themes. The passages ranged in length from 30-200
words. The Fry and Dale-Chall Readabilities were
used to level the passages from primer to eighth
grade. Ten multiple-choice comprehension questions
were constructed for each passage based on the Barrett
(cited in Lapp and Ramsey, 1976) and Krathwohl (1964)
Taxonomies.

The Informal Reading Inventory (IRI) was administered
as a group test. To score the test, a percentage was 
determined and compared to the following criteria in 
order to determine the comprehension reading levels: 
(a) Independent: 90-100%, (b) Instructional: 75-89%, 
and (c) Frustration: 0-74%. 

2. A Cloze Test was constructed by deleting every seventh
word. The first and last sentences of each passage
were left intact. The test was administered as a
group test and students were asked to fill in the
blanks. In this test an answer was considered correct
as long as it was appropriate contextually. To score
the Cloze Test, Bormuth's criteria (1967) were used:
(a) Independent Level: 50-57%, (b) Instructional
Level: 35-46%, and (c) Frustration Level: 19-31%.

3. The Stanford Diagnostic Test (1976): Red (grades 1-3),
Green (grades 3-5), and Brown (grades 5-8) levels were 
used to examine literal and inferential reading compre- 
hension. 

4. A questionnaire entitled "Teacher Estimate of the
Students' Reading Levels" was used. Bilingual students'
names were listed and the teacher was asked to
estimate the reading level (grade) at which each student
was functioning. (This was considered the instructional
reading level.)
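
The percentage criteria stated for the GIRI and the Cloze Test can be sketched in code as follows. This is only an illustrative sketch: the function names and the handling of edge cases are assumptions, not part of the study. In particular, the cloze bands as printed do not cover every percentage, so scores falling between or outside them are returned as unclassified rather than forced into a level.

```python
from typing import Optional


def giri_level(correct: int, total: int = 10) -> str:
    """Classify a GIRI passage score by percent correct.

    The default of 10 matches the ten questions per passage; the
    cut-offs (90% and 75%) are the ones stated in the text.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    percent = 100.0 * correct / total
    if percent >= 90:
        return "Independent"
    if percent >= 75:
        return "Instructional"
    return "Frustration"


# Cloze bands exactly as printed in the text (percent correct).
CLOZE_BANDS = (
    ("Independent", 50, 57),
    ("Instructional", 35, 46),
    ("Frustration", 19, 31),
)


def cloze_level(percent: float) -> Optional[str]:
    """Return the cloze level, or None when no printed band applies."""
    for name, low, high in CLOZE_BANDS:
        if low <= percent <= high:
            return name
    return None  # assumption: gaps between bands are left unclassified
```

For example, a student answering 8 of 10 GIRI questions correctly falls at the Instructional level for that passage, while a cloze score of 48% lies between the printed bands and is left unclassified by this sketch.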

Procedure 

After the population was selected, arrangements were made 
with the bilingual coordinator of the school to conduct a teacher- 
training workshop and to administer the tests. The teacher- 
training workshop conducted for six bilingual teachers included: 
(a) a presentation and explanation of the construction and 
purpose of the tests to be administered; (b) a practice admin- 
istration, scoring, and evaluation of the tests among the tea- 
chers; (c) a discussion and planning of the group testing with 
bilingual students; and (d) a completion of the "Teacher
Estimate of the Students' Reading Levels" questionnaire. The
arrangements were that the Cloze Test would be administered
initially; the IRI would follow within three days, as exemplified
in the Bormuth study (1967); then, the Stanford Diagnostic Test
(1976) would be administered.

Summary and Analysis of Data 

Data concerning the demographics and results from the assess- 
ment measures administered to bilingual students are summarized 
in Table 1. 

The scores from the Stanford Diagnostic Test (1976) are
an average of the literal and inferential comprehension results
for the red, green, and brown levels, respectively. For
descriptive purposes, the students' report card grades on Reading
in the Native Language (N.L.) and English as a Second Language
(ESL) were also included.

Close inspection of Table 1 demonstrates that the students in
this random sample are functioning not only far below their
developmental grade level but also below their assigned
present grade level. Interestingly enough, this holds true
even for students who have been exposed to English reading
instruction for up to five years. Clearly then, the original
argument is valid: there is a pressing need to assess these
students' functional reading levels as early as possible and to
match instruction and materials to the students' reading needs.

TABLE 1

DEMOGRAPHICS AND RESULTS FROM ASSESSMENT MEASURES
ADMINISTERED TO BILINGUAL STUDENTS

              Grade                       Instructional Estimated          Previous
              Student                     Grade Level                      Grades
              Should    Present  Years    Teacher                Stanford
Subject  Age  Be In     Grade    in U.S.  Estimate  Cloze  GIRI  Level     N.L.  ESL

1        14   9         8        5        8         5      7,8   0.0 ★     B     A
2        13   8         7        2.5      5-5       1      5     0         A     A
3        14   9         7        3        1-2       P      4     ? 7 y     D     B
4             9         8        3        3-5       1      4     3.6 J           A
5             9         7        5        3-5       5      4     3.7       B     A
6        13   8                  3        1-2       .4     6     4.8 X           B+/A
7        15   10        8        5        3-5       4-5    5     4.5 ★     C     B+/A-
8        14   9         8        3.5      3-5       3      4     4 √       C     A
9        13   8         7        4        1-2       1      4     2.9             B
10       15   10        8        2        1-2       2      7,8   4.5       A     A

Key:
N.L. = Native Language--Reading
ESL  = English as a Second Language
P    = Primer Level
★    = Stanford--Brown Level
X    = Stanford--Green Level
     = Stanford--Red and Green Level

In noting the students' reading grades, those in the native
language appear to be much lower than those in English as a
Second Language. A possible reason for this is that students
are heterogeneously grouped in the native language, and thus
each student has to compete with students of all ability levels,
whereas, for English as a Second Language, students are
homogeneously grouped, competing only with students who normally
function at their own level.

Contrary to the above, both native- and second-language
teachers tend to underestimate the students' reading levels
when compared with the results of reading tests. This factor
could be related to the students' poor performance in reading
through the self-fulfilling prophecy: teachers do not feel
the students can do well, and as a result, students do not do
well. Ironically, ESL teachers assign "A's" and "B's" to the
students on the report cards. The question is "why?" If the
rationale is to help the students' self-concept, then this is
acceptable. However, it cannot be overlooked that they may
simply be complying with the educational system's social
promotion policy, in spite of the fact that they are aware of the
students' low reading levels. In this case, the students are
deluded into believing that they are achieving and may not be
prepared in terms of reading skills to compete with the world
in and out of school.

IMPLICATIONS FOR TEACHING 

Clearly, when junior high to high school bilingual students
are functioning five or more years below grade level, something
has to be done about instruction. The danger is obvious: if
instruction in the native language and English is not presented
as a relevant experience, these students become frustrated and
eventually drop out of school. Teachers must consciously deal
with the students' reading needs and not gloss over these needs
by issuing good grades. To deal with these needs, teachers must
meet (a) the students' academic needs by providing high
interest, low level materials and (b) the students' interests by
providing materials and skills that will prepare them for
whatever endeavor they seek in the outside world, whether it be
higher education or a working vocation.

The possible guidelines for meeting student needs are: 

1. Language medium--A decision has to be made whether
the student is to be instructed in two languages or whether
it may be too late for the student who does not have native
language competencies to get the skills in the native language
and transfer these. ESL may be the solution in this case
because if the student is frustrated, he/she will drop out of
school altogether.

2. Assessment--Pretests and posttests (informal and
formal measures) must be administered on a continuing basis
so that instruction can constantly be adjusted to the
students' reading and content area strengths and weaknesses.

3. Instruction--(a) Time to be allotted in the two
languages; (b) Technique to be employed--whole group instruction
or individualized instruction. (The latter would be the most
beneficial in placing part of the responsibility for instruction
on the student. With the issuing of performance contracts, the
student would then be accountable for his/her success and
failure.); (c) Content--relevant, functional instruction, for
example, learning a driver's manual, filling out job
applications, or relevant information that the student needs to
deal with in his/her English environment; (d) Materials--commercial
and teacher- or student-made, high interest, low reading level;
(e) Evaluation of the student, instruction, and program, with
changes made accordingly.

APPENDIX



An Illustrative Story from the Informal Reading Inventory 
(In Cloze and Multiple-Choice Versions) 



STORY III

"Luciano, wake up! Have you forgotten what today is?"
whispered mother. I opened my sleepy eyes. I ______ that
today my uncle and his ______ would be arriving from Italy.
I ______ as I thought of the food ______ fun I would have.
Then I ______ to my mother. She smiled and ______ me. I then
said, "Mother, can ______ take my cousins to the Feast ______
St. Anthony?" Mother wrinkled her forehead ______ answered,
"You must not go alone. ______ go together."

I jumped out of ______ and dressed quickly. I wore the
______ suit that mother had bought for ______ special day. I
ran out to ______ kitchen. Anna was crying, as she ______ did
when Mother combed and braided ______ hair. Father and
Grandfather sat at ______ table drinking expresso coffee as
they ______ of the olden days in Italy. ______ liked to
listen, because I had ______ been there. Grandfather always
said, "America ______ a rich country today, but Americans
______ not have the respect in families ______ we Italians
have." Grandfather said, "A ______ is important. He knows the
way ______ should be in the family." As ______ talked,
Grandfather noticed me and said, "______, Luciano, show your
grandpa what a ______ grandson he has. You will be a fine man
and a papa someday too!"



STORY III 


"Luciano, wake up! Have you forgotten what today is?" 
whispered mother. I opened my sleepy eyes. I remembered that 
today my uncle and his family would be arriving from Italy. 
I smiled as I thought of the food and fun I would have. Then 
I turned to my mother. She smiled and kissed me. I then said, 
"Mother, can I take my cousins to the Feast of St. Anthony?" 
Mother wrinkled her forehead and answered, "You must not go 
alone. We'll go together." 

I jumped out of bed and dressed quickly. I wore the new 
suit that mother had bought for this special day. I ran out
to the kitchen. Anna was crying, as she always did when Mother
combed and braided her hair. Father and Grandfather sat at the 
table drinking expresso coffee as they talked of the olden days 
in Italy. I liked to listen, because I had never been there. 
Grandfather always said, "America is a rich country today, but 
Americans do not have the respect in families that we Italians 
have." Grandfather said, "A papa is important. He knows the 
way things should be in the family." As he talked, Grandfather 
noticed me and said, "Come, Luciano, show your grandpa what a 
fine grandson he has. You will be a fine man and a papa some- 
day too!" 
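
The two versions of Story III reflect the construction rule given under Instruments: every seventh word is deleted, with the first and last sentences left intact. A minimal sketch of that rule is given below. The function name `make_cloze` and its parameters are hypothetical, and the caller is assumed to supply the first and last sentences separately, so the sketch makes no attempt at sentence detection.

```python
import re
from typing import List, Tuple


def make_cloze(first: str, middle: str, last: str,
               n: int = 7, blank: str = "______") -> Tuple[str, List[str]]:
    """Blank every n-th word of `middle`; `first` and `last` stay intact.

    Returns the cloze text and the deleted words (an answer key).
    """
    words = middle.split()
    answers = []
    for i in range(n - 1, len(words), n):
        # Split off leading and trailing punctuation so that quotation
        # marks, commas, and periods survive in the cloze text.
        lead, core, trail = re.match(r"^(\W*)(.*?)(\W*)$", words[i]).groups()
        answers.append(core)
        words[i] = f"{lead}{blank}{trail}"
    return " ".join([first, *words, last]), answers
```

Applied to the opening of Story III, with the sentence ending in "whispered mother." held out as the first sentence, the sketch deletes "remembered," "family," "smiled," and "and," which matches the blanks in the cloze version of the story.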

STORY III--QUESTIONS

1. Who was coming to visit Luciano?
a. his uncle and his family
b. his father and grandfather
c. his sister and mother
d. his sister and her family

2. Where did Luciano want to take his cousins?
a. to the block party
b. to the neighborhood fiesta
c. to the feast of St. Anthony
d. to the amusement park

3. What did Luciano's father and grandfather drink?
a. wine
b. coca-cola
c. tomato juice
d. expresso coffee

4. Why did Luciano like to listen to Grandfather?
a. He wanted to go to Italy.
b. He had never been to Italy.
c. He liked Italy better than America.
d. He wanted to live in Italy.

5. What did Luciano do after he jumped out of bed?
a. He went to the kitchen.
b. He listened to his grandfather.
c. He dressed.
d. He talked to his mother.

6. Why do you think Anna was crying?
a. Her grandfather spilled coffee on her.
b. She fell off a tree.
c. She did not want to eat.
d. She was in pain.

7. How do you know that Grandfather missed Italy?
a. He wanted to go back to Italy.
b. He always talked about Italy.
c. He had relatives in Italy.
d. He was rich in Italy.

8. How do you know that Mother was concerned about Luciano?
a. She did not want him to go to the feast by himself.
b. She always bought him new suits.
c. She did not want him to have a good time.
d. She let him do whatever he pleased.

9. According to Grandfather, what is the difference between
Italians and Americans?
a. Americans are poorer than Italians.
b. Italians drink more expresso than Americans.
c. Italians have better houses than Americans.
d. The people of the two countries have different family values.

10. How does Grandfather feel about Luciano?
a. He is proud of him.
b. He is unhappy with him.
c. He dislikes him.
d. He feels that Luciano likes to fool around too much.
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